Probabilistic Graphical Models
We introduce probabilistic graphical models (PGMs) as probabilistic models that represent the conditional dependence structure between random variables. Some of the most common PGMs are Markov Random Fields (MRFs) and Bayesian Networks (BNs).
We would like to explore inference in PGMs. Let:
- $x_E$ = the observed evidence
- $x_F$ = the unobserved variables we want to infer
- $x_R = x \setminus \{x_E, x_F\}$ = the remaining variables, extraneous to the query
Then we have $p(x_F \mid x_E) = \frac{p(x_F, x_E)}{p(x_E)} = \frac{p(x_F, x_E)}{\sum_{x_F} p(x_F, x_E)}$, where $p(x_F, x_E) = \sum_{x_R} p(x_F, x_E, x_R)$ by marginalization.
We also define $p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{t-1}, \dots, x_1)$, where $p(x_1 \mid x_0) = p(x_1)$.
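As a concrete (if naive) illustration of this query, the sketch below computes $p(x_F \mid x_E)$ by brute-force marginalization of a small joint table; the three binary variables and their joint values are assumptions made purely for illustration.

```python
import numpy as np

# Toy joint p(x_E, x_F, x_R) over three binary variables (illustrative values only),
# stored as a 2x2x2 table with axes ordered (x_E, x_F, x_R).
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()                # normalize so the table is a valid distribution

# p(x_F, x_E) = sum_{x_R} p(x_F, x_E, x_R): marginalize out the extraneous variable
p_EF = joint.sum(axis=2)            # shape (2, 2), indexed [x_E, x_F]

# p(x_E) = sum_{x_F} p(x_F, x_E)
p_E = p_EF.sum(axis=1)              # shape (2,)

# p(x_F | x_E) = p(x_F, x_E) / p(x_E), for every value of x_E at once
p_F_given_E = p_EF / p_E[:, None]
print(p_F_given_E.sum(axis=1))      # each row sums to 1
```

Enumerating the full table like this costs $O(k^n)$ for $n$ variables with $k$ states, which is what variable elimination, described next, avoids.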
Variable Elimination
There are several ways to perform this marginalization, and the order in which we do it affects the computational cost. One such tool is called variable elimination (VE):
- A simple and general exact inference algorithm in any PGM (e.g. MRFs, BNs, etc.)
- Dynamic programming avoids enumerating all variable assignments
VE applied to Trees
Recall that for a graph $G=(V,E)$, the joint distribution factorizes as $p(x_{1:n}) \propto \prod_{i \in V} \psi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$.
We define the message sent from $j$ to $i \in N(j)$ as $m_{j \to i}(x_i) = \sum_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus \{i\}} m_{k \to j}(x_j)$. If $x_j$ is observed, the sum is dropped and the message is $m_{j \to i}(x_i) = \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus \{i\}} m_{k \to j}(x_j)$, evaluated at the observed value of $x_j$.
Then we define the belief as $b(x_i) \propto \psi_i(x_i) \prod_{j \in N(i)} m_{j \to i}(x_i)$.
Once normalized, beliefs are the marginals we want to compute.
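For example, on a two-node tree with the single edge $(1,2)$ and nothing observed, node 2 has no other neighbors, so $m_{2 \to 1}(x_1) = \sum_{x_2} \psi_2(x_2)\, \psi_{12}(x_1, x_2)$ and $b(x_1) \propto \psi_1(x_1) \sum_{x_2} \psi_2(x_2)\, \psi_{12}(x_1, x_2) \propto \sum_{x_2} p(x_1, x_2) = p(x_1)$, i.e. exactly the marginal of the joint above.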
Belief Propagation
We define Belief Propagation (BP) as message passing between neighboring vertices of the graph. The algorithm proceeds as follows:
- Choose root r arbitrarily
- Pass messages from leaves to r
- Pass messages from r to leaves
- These two passes are sufficient on trees
- Compute beliefs b(xi)
Or compute them in two steps (a code sketch of the full two-pass procedure follows this list):
- Compute unnormalized beliefs $\tilde{b}(x_i) = \psi_i(x_i) \prod_{j \in N(i)} m_{j \to i}(x_i)$
- Normalize: $b(x_i) = \frac{\tilde{b}(x_i)}{\sum_{x_i} \tilde{b}(x_i)}$
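Below is a minimal sketch of the two-pass schedule on a three-node chain (a special case of a tree). The chain structure, the random potentials, and all variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

k = 3                                         # number of states per variable
rng = np.random.default_rng(0)

# Chain 0 - 1 - 2: node potentials psi_i and edge potentials psi_ij (illustrative values)
psi = [rng.random(k) for _ in range(3)]       # psi[i][x_i]
psi_edge = {(0, 1): rng.random((k, k)),       # psi_edge[(i, j)][x_i, x_j]
            (1, 2): rng.random((k, k))}

def edge_pot(i, j):
    """Return psi_ij as an [x_i, x_j] table regardless of how the edge is stored."""
    return psi_edge[(i, j)] if (i, j) in psi_edge else psi_edge[(j, i)].T

neighbors = {0: [1], 1: [0, 2], 2: [1]}
msgs = {}                                     # msgs[(j, i)] = m_{j->i}(x_i)

def send(j, i):
    """m_{j->i}(x_i) = sum_{x_j} psi_j(x_j) psi_ij(x_i, x_j) prod of incoming messages to j except from i."""
    prod = psi[j].copy()
    for nbr in neighbors[j]:
        if nbr != i:
            prod *= msgs[(nbr, j)]
    m = edge_pot(i, j) @ prod                 # sums over x_j
    msgs[(j, i)] = m / m.sum()                # normalize for numerical stability

# Two passes with root r = 1: leaves -> root, then root -> leaves
send(0, 1); send(2, 1)                        # upward pass
send(1, 0); send(1, 2)                        # downward pass

# Beliefs: b(x_i) proportional to psi_i(x_i) times the product of incoming messages
for i in range(3):
    b = psi[i].copy()
    for nbr in neighbors[i]:
        b *= msgs[(nbr, i)]
    print(i, b / b.sum())                     # normalized marginal p(x_i)
```

The two blocks of `send` calls correspond exactly to the leaves-to-root and root-to-leaves passes listed above.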
VE applied to MRFs and BNs
Before describing such algorithms for MRFs and BNs, we define the following terms:
- Introduce a nonnegative factor $\phi$
- Marginalizing over a variable $X$ introduces a new factor, denoted by $\tau$
Then we define the sum-product algorithm, which computes $p(x_F \mid x_E) \propto \tau(x_F, x_E) = \sum_{x_R} \prod_{C \in \mathcal{F}} \phi_C(x_C)$, where $\mathcal{F}$ is the set of potentials or factors.
- For DAGs (BNs), $\mathcal{F}$ is the set of sets of the form $\{i\} \cup \mathrm{Parents}(i)$ for all $i$
- For MRFs, $\mathcal{F}$ is given by the set of all maximal cliques
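For example, for the chain BN $x_1 \to x_2 \to x_3$ we have $\mathcal{F} = \{\{1\}, \{1,2\}, \{2,3\}\}$ with factors $\phi_{\{1\}} = p(x_1)$, $\phi_{\{1,2\}} = p(x_2 \mid x_1)$ and $\phi_{\{2,3\}} = p(x_3 \mid x_2)$; the same chain viewed as an MRF has maximal cliques $\{1,2\}$ and $\{2,3\}$.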
The complexity of the Variable Elimination algorithm is $O(m\, k^{N_{\max}})$, where:
- $m$ is the number of initial factors (i.e. $m = |\mathcal{F}|$)
- $k$ is the number of states each random variable takes (assumed equal for all variables here)
- $N_i$ is the number of random variables inside each sum $\sum_i$
- $N_{\max} = \max_i N_i$ is the number of random variables inside the largest sum
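For example, eliminating the variables of a chain of length $T$ in the order $x_1, x_2, \dots$ makes every intermediate sum involve only two variables (e.g. $\tau(x_2) = \sum_{x_1} p(x_1)\, p(x_2 \mid x_1)$), so $N_{\max} = 2$ and the cost is $O(T k^2)$ rather than the $O(k^T)$ of naive enumeration.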
Belief Propagation on MRFs
If the graph we want to run inference on is not a tree and has cycles, we keep passing messages until convergence; this is called Loopy Belief Propagation (the result is only an approximation to the exact marginals).
Loopy BP algorithm (may not converge):
- Initialize messages uniformly: $m_{i \to j}(x_j) = [1/k, \dots, 1/k]^T$
- Keep running the BP updates $m_{j \to i}(x_i) = \sum_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus \{i\}} m_{k \to j}(x_j)$ until convergence, normalizing each message for numerical stability
- Compute beliefs $b(x_i) \propto \psi_i(x_i) \prod_{j \in N(i)} m_{j \to i}(x_i)$ (a code sketch follows this list)
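A minimal sketch of loopy BP on a 3-cycle (the smallest loopy graph) follows; the graph, the random potentials, and the convergence tolerance are illustrative assumptions, and the printed beliefs are only approximations to the true marginals.

```python
import numpy as np

k = 2                                           # states per variable
rng = np.random.default_rng(1)

# A 3-cycle (triangle): the simplest graph with a loop (illustrative potentials)
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
psi = {i: rng.random(k) for i in nodes}                    # psi_i(x_i)
psi_edge = {e: rng.random((k, k)) for e in edges}          # psi_ij(x_i, x_j)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def edge_pot(i, j):
    """Return psi_ij as an [x_i, x_j] table regardless of how the edge is stored."""
    return psi_edge[(i, j)] if (i, j) in psi_edge else psi_edge[(j, i)].T

# Initialize all messages uniformly: m_{i->j}(x_j) = [1/k, ..., 1/k]
msgs = {(i, j): np.full(k, 1.0 / k) for i in nodes for j in neighbors[i]}

for it in range(100):
    new_msgs = {}
    for j in nodes:
        for i in neighbors[j]:
            # m_{j->i}(x_i) = sum_{x_j} psi_j(x_j) psi_ij(x_i, x_j) prod of other incoming messages
            prod = psi[j].copy()
            for l in neighbors[j]:
                if l != i:
                    prod *= msgs[(l, j)]
            m = edge_pot(i, j) @ prod
            new_msgs[(j, i)] = m / m.sum()       # normalize for stability
    delta = max(np.abs(new_msgs[key] - msgs[key]).max() for key in msgs)
    msgs = new_msgs
    if delta < 1e-8:                             # stop once the messages stop changing
        break

# Approximate beliefs b(x_i) proportional to psi_i(x_i) times the product of incoming messages
for i in nodes:
    b = psi[i].copy()
    for j in neighbors[i]:
        b *= msgs[(j, i)]
    print(i, b / b.sum())
```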
MAP Inference via BP
We modify the BP updates to take the form $m_{j \to i}(x_i) = \max_{x_j} \psi_j(x_j)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus \{i\}} m_{k \to j}(x_j)$, and get the beliefs (max-marginals) $b(x_i) \propto \psi_i(x_i) \prod_{j \in N(i)} m_{j \to i}(x_i)$. The MAP estimate is $\hat{x}_i = \arg\max_{x_i} b(x_i)$.
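For a rough sketch of this max-product variant, one can reuse the names (`psi`, `neighbors`, `edge_pot`, `msgs`) from the loopy-BP sketch above, which are illustrative assumptions, and only swap the sum for a max:

```python
def send_max(j, i):
    """Max-product message: same as the sum-product update, but with a max over x_j instead of a sum."""
    prod = psi[j].copy()
    for l in neighbors[j]:
        if l != i:
            prod *= msgs[(l, j)]
    m = (edge_pot(i, j) * prod[None, :]).max(axis=1)   # max_{x_j} psi_j(x_j) psi_ij(x_i, x_j) * incoming
    msgs[(j, i)] = m / m.sum()                         # normalize for stability

# Once the messages have converged, the MAP estimate is read off the max-marginals:
# x_hat_i = argmax_{x_i} psi_i(x_i) * prod_{j in N(i)} m_{j->i}(x_i)
```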